Comparison of Different Distance Measures on Hierarchical Document Clustering in 2-Pass Retrieval
Abstract
Hierarchic document clustering has been applied to search results (query-specific clustering) on the grounds of its potentially improved effectiveness compared with both static clustering and conventional inverted file search (IFS). In this paper we review and compare the effects of seven different measures of similarity among documents in hierarchic query-specific clustering. We conducted a number of experiments using the OHSUMED document collection. The experiments seem to indicate that the choice of similarity measure affects the quality of clustering, either positively or negatively.
Key-Words: cluster-based search; hierarchical clustering; distance measures
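As a rough illustration of why the choice of measure can matter, the sketch below clusters a tiny set of stand-in search results with the same agglomerative procedure under three different distance functions. Everything in it is an assumption made for illustration: the abstract does not name its seven measures, so cosine, Euclidean, and Jaccard distance are only common examples; average-link merging, raw term-frequency vectors, the toy documents, and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector for one document (illustrative weighting)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def euclidean_distance(a, b):
    # Counter returns 0 for terms missing from a document.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in set(a) | set(b)))

def jaccard_distance(a, b):
    union = set(a) | set(b)
    return 1.0 - len(set(a) & set(b)) / len(union) if union else 0.0

def average_link_clusters(docs, distance, k):
    """Repeatedly merge the closest pair of clusters (average link) until k remain."""
    vectors = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def link(c1, c2):
        total = sum(distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        _, x, y = min(
            (link(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)

# Hypothetical first-pass search results; document 0 is a longer version of document 1.
docs = [
    "heart risk heart risk heart risk",
    "heart risk",
    "heart protein",
]

print(average_link_clusters(docs, cosine_distance, 2))     # [[0, 1], [2]]
print(average_link_clusters(docs, euclidean_distance, 2))  # [[0], [1, 2]]
print(average_link_clusters(docs, jaccard_distance, 2))    # [[0, 1], [2]]
```

On this toy input, cosine and Jaccard distance group the two documents with the same vocabulary, while Euclidean distance is dominated by document length and groups the two short documents instead. This shows only that the measures can disagree; it says nothing about which measure performed best in the paper's OHSUMED experiments.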
Similar resources
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of the MTS analysis process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
Comprehensive Survey on Distance / Similarity Measures between Probability Density Functions
Distance or similarity measures are essential to solve many pattern recognition problems such as classification, clustering, and retrieval problems. Various distance/similarity measures that are applicable to compare two probability density functions, pdf in short, are reviewed and categorized in both syntactic and semantic relationships. A correlation coefficient and a hierarchical clustering ...
Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics
This paper discusses the future of World Wide Web development, called the Semantic Web. Undoubtedly, Web service is one of the most important services on the Internet, which has had the greatest impact on the generalization of the Internet in human societies. Internet penetration has been an effective factor in growth of the volume of information on the Web. The massive growth of informat...
An Approach for Document Clustering using Agglomerative Clustering and Hebbian-type Neural Network
Clustering is a useful method that categorizes a large quantity of unordered text documents into a small number of meaningful and coherent collections, thereby providing a basis for instinctive and informative navigation and browsing mechanisms. Different types of distance functions and similarity measures have been used for clustering, such as squared, cosine similarity, Euclidean distance and ...
Document Retrieval using Hierarchical Agglomerative Clustering with Multi-view point Similarity Measure Based on Correlation: Performance Analysis
Clustering is one of the most interesting and important tools for research in data mining and other disciplines. The aim of clustering is to find the relationship among the data objects, and classify them into meaningful subgroups. The effectiveness of clustering algorithms depends on the appropriateness of the similarity measure between the data in which the similarity can be computed. This pap...
Publication date: 2004